A Refined Analysis of LSH for Well-dispersed Data Points
نویسندگان
چکیده
Near neighbor problems are fundamental in algorithms for high-dimensional Euclidean spaces. While classical approaches suffer from the curse of dimensionality, locality sensitive hashing (LSH) can effectively solve α-approximate r-near neighbor problem, and has been proven to be optimal in the worst case. However, for real-world data sets, LSH can naturally benefit from well-dispersed data and low doubling dimension, leading to significantly improved performance. In this paper, we address this issue and propose a refined analyses for running time of approximating near neighbors queries via LSH. We characterize dispersion of data using Nβ, the number of βr-near pairs among the data points. Combined with optimal data-oblivious LSH scheme, we get a O ( (
منابع مشابه
A new method to determine a well-dispersed subsets of non-dominated vectors for MOMILP problem
Multi-objective optimization is the simultaneous consideration of two or more objective functions that are completely or partially inconflict with each other. The optimality of such optimizations is largely defined through the Pareto optimality. Multiple objective integer linear programs (MOILP) are special cases of multiple criteria decision making problems. Numerous algorithms have been desig...
متن کاملFast Information-Theoretic Agglomerative Co-clustering
Our algorithm iteratively merges those clusters whose merge yields a lower objective cost. However, operations such as finding nearest neighbors or closest pair of clusters are expensive, especially in high dimensions. To quickly find highly similar clusters to be merged, we exploit the Locality-Sensitive Hashing (LSH) technique, which we briefly describe in this section. Simply put, LSH [2] is...
متن کاملHigh-dimensional Spherical Range Reporting by Output-Sensitive Multi-Probing LSH
We present a data structure for spherical range reporting on a point set S, i.e., reporting all points in S that lie within radius r of a given query point q (with a small probability of error). Our solution builds upon the Locality-Sensitive Hashing (LSH) framework of Indyk and Motwani, which represents the asymptotically best solutions to near neighbor problems in high dimensions. While tradi...
متن کاملReverse Nearest Neighbors Search in High Dimensions using Locality-Sensitive Hashing
We investigate the problem of finding reverse nearest neighbors efficiently. Although provably good solutions exist for this problem in low or fixed dimensions, to this date the methods proposed in high dimensions are mostly heuristic. We introduce a method that is both provably correct and efficient in all dimensions, based on a reduction of the problem to one instance of εnearest neighbor sea...
متن کاملHybrid LSH: Faster Near Neighbors Reporting in High-dimensional Space
We study the r-near neighbors reporting problem (rNNR) (or spherical range reporting), i.e., reporting all points in a high-dimensional point set S that lie within a radius r of a given query point. This problem has played building block roles in finding near-duplicate web pages, solving k-diverse near neighbor search and content-based image retrieval problems. Our approach builds upon the loca...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2017